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The methods of system identification and spectral analysis are well 
documented in the literature. In this paper, we attempt to merge the 
methods of least- square system identification and short-time Fourier 
transform spectral estimation. Starting from the least-squares normal 
equations for a linear system identification problem and expanding 
the signals in short- time Fourier transforms, we derive a Toeplitz 
system of equations, the solution of which approximates the original 
least-squares equation solution. We then bound the error norm be- 
tween the two solution methods and show the properties of the error 
by numerical methods. The resulting "spectral" estimation method is 
shown to completely remove the bias normally associated with pre- 
viously proposed spectral estimation procedures. The method appears 
to be particularly useful when one is interested in linear system 
identification of very large systems (long impulses response) or for 
system identification in the presence of nonstationary (e.g., burst) 
noise. Extensive numerical results are included. 

I. INTRODUCTION 

Although time-domain methods have been in use for system iden- 
tification and modeling for at least 35 years, no simple, robust proce- 
dure has been proposed which uses short-time Fourier transform 
methods. Classical spectral analysis methods are generally inadequate 
for all but the simplest cases because of their unsatisfactory properties. 
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Fig. 1 — Block diagram of the system identification model. 



Recent results in the theory of short-time spectral analysis have 
suggested a framework for a new method of spectral estimation and 
system identification. 1 " 6 It is the purpose of this paper to describe the 
theory of this new algorithm and to compare and contrast it to three 
alternative procedures of system identification. In this paper we at- 
tempt to clear up many questions raised by an earlier related paper. 1 
Figure 1 is a block diagram of the general system identification 
model. The input signal x(n) is assumed to be zero mean white noise 
having variance a\. The linear system h(n) is assumed to be a finite 
impulse response (fir) system of duration M samples, 

h(n) = 0, n<0,n>M-l. (1) 

At the output of the linear system, an independent white noise q(n) 
(having zero mean and variance a 2 q ) is added to v(n) to give the output 
signal v(n). Thus 

y(n) = x(n)*h(n) + q(n) (2a) 

Af-l 

= X h(m)x(n-m) + q(n). (2b) 

m— 

The signal-to-noise ratio (s/n) at the output of the system is defined 
as 

where E is the expectation operation. 

The system identification problem is one of finding an estimate h(n) 
of h(n), given only N samples of the input x(n) and the output v(n). It 
is assumed that the duration of h{n) (call this M) satisfies the relation 

M 2* M, (4) 

i.e., that we have knowledge of, or can accurately bound, the duration 
of the system impulse response. In general, we assume that N ~» M. 
Assuming the constraint of eq. (4) is obeyed, one reasonable measure 
of system performance is the quantity 

1744 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1979 



Q = 10 log! 



I [h(m) - fi(m)f 

m-0 

I h 2 (m) 

m=0 



(5) 



The quantity Q is called the "misadjustment" or "misalignment" 
between h(n) and /i(n). 

Several classes of techniques are known in the time domain for 
solving the system identification problem, the most important of which 
is the classical least-squares analysis (lsa) method. In the frequency 
domain, however, only very simple, suboptimal techniques have been 
proposed for solving the system identification problem, and these 
techniques have not proven to be entirely adequate for any reasonable 
class of problems. 5,7 In this paper, we derive a new short-time Fourier 
transform domain approach to the system identification problem 
which alleviates many of the problems encountered using previously 
proposed frequency domain methods. High-quality frequency domain 
techniques are interesting for many reasons, but several notable ones 
are that (i) very large systems may be estimated (M = 10 3 points), (ii) 
fft methods are numerically very efficient, (Hi) ill-conditioned prob- 
lems are naturally identified, and (iv) the coherence function may be 
computed and used adaptively to dynamically modify the analysis 
procedures in a data-dependent way (a form of nonlinear analysis). 11 
Furthermore, with the advent of high-speed array processors, algo- 
rithmic procedures which use ffts are frequently easily implemented. 

II. LEAST-SQUARES SOLUTION 

Several basic results are necessary before we describe our frequency 
domain method. Since our approach is based on the method of least- 
squares analysis (lsa), we define that procedure first. 

In the lsa method, one minimizes the quantity 



1= I (y(n) - y(n))\ 

n-M-\ 



(6) 



where 



Af-I 



and 



y(n) = £ h(m)x(n - m) + q(n) 
= h*x + q(n) 



Af-l 

y(n) = 5] fi(m)x(n — m) = fi*x. 



(7) 



(8) 
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Taking partials of / with respect to h(l), the unknowns, for ^ Z 
s* M — 1 results in the set of equations 



N-\ 



l (y(n) - y(n))x(n - I) = 0, *s / *£ M - 1, (9) 



or in terms of x and v 



£ h(m) £ x(n - l)x(n - m) = £ y(n)x(n - I), ^ / =s M - 1, (10) 



m-0 



where in all cases £„ implies a sum from M - 1 to N - 1, where N is 
the total number of data points required by the analysis. Our limits on 
n have been chosen so that the first and last required points of data 
are *(0) and x(N - 1). 
We now define 

<t>d, m) = X x(n - l)x(n - m) (11) 

n 

r(l) = l l y(n)x(n-l) (12) 

n 

(f, = [<*>(/, m)] (13) 

r = [r(D] (14) 

h = [h(l)l (15) 

Using this notation, eq. (10) becomes the matrix equation 

<f>h = r. (16) 

Our approach will be to approximate eq. (16) by a Toeplitz matrix 
equation which may be solved by one of the Toeplitz inversion meth- 
ods 8,9 or approximately by dft methods. In general, the solution of eq. 
(11) would be the optimal approach; however, when h is very large 
(i.e., M = 10 3 ), the computation, storage, and solution of eq. (11) is 
totally impractical. Under these conditions, the methods of this paper 
might be useful. 

III. OVERLAP-ADD EXPANSIONS 

The key to our method is the overlap-add expansion of a signal 
based on the following identity: 



R-l 



V w(mR -n) = (1/R) £ W(e' {2 " /R)p )e- J{2 * /R)np , (17) 



p=0 



where W{z) is the z transform of w(n) and m, R, n, p are integers. 
Equation (17) is the discrete version of the Poisson sum formula. 2 If 
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we now assume that w(n) is a time-limited lowpass function, such as 
a Hamming or Kaiser window, and R is chosen such that e J2 " /R is 
greater than the cutoff frequency of W{z), then the following approx- 
imate relation holds: 

(R/Wie*)) £ ^(mR - n) = 1, (18) 

with an error determined by the out-of-band energy in the window 
w{n). For any reasonable window, the error is negligible. 2 If we define 
D = W(e j0 )/R and multiply eq. (18) by any signal x(n), we obtain the 
overlap-add expansion of x(n): 

x(n)= X *»(»). (19) 

m =— oc 

where 

Xm(n) = — w{mR — n)x(n). (20) 

The Fourier transform of x m (n) is called the "short-time" Fourier 
transform. 3-4 Expansions of <> and r by use of overlap-add expansions 
of x(n) and y(n) are possible through straightforward application of 
eqs. (19) and (11) and (12): 



*(/,m)«2 X Xk(n-l) S x p (n-m) 

n A=-« p— « 



-II <t> P M,m), (21) 

p— *— 

where we have defined 

<t> P k(l, m) = I **(rc - /)* p (n _ m ) 

= jpl,x(n- l)w(kR + l-n) 



and 



• x(n- m)w(pR + m- n) (22) 

rd)-Eytoxtii-l) 



= I X I y/»(»)x*(n - I) 

n p=—oo k=— oe 
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- I 2 r pk (l), 

p—ca A— oo 



(23) 



where 



r pk (l) = Zy P (n)x k (n-l) 



= 7p I y(n)w(pR - n)x(n - l)w(kR + I - n). (24) 

In Fig. 2 we show the relative window displacements for the term 
<f> P k(l, m) of eq. (22) assuming a window L points long. Due to the time 
truncation of the windows, the sum on n does not extend beyond the 
interval N A ^n^N B , where 



N A = max(kR + l,pR + m) + I - L 
N B = mm(kR + l,pR + m). 



(25) 



The situation is identical for the case of r pk (l) if m is set to zero in eq. 

(25). 

When implementing the sums onp and k, it is frequently convenient 
to make a linear transformation of variables from k to q of the form 
k = p + q. In these variables, N A and N B may be written as 



N A (p, q)=pR-L+l + max(m, qR + /) 
N B {p, q) =pR + min(m, qR + I). 



(26) 

(27) 



IV. SHORT-TIME SPECTRAL APPROACH TO SYSTEM IDENTIFICATION 

In this section, we show how to split the lsa matrix equation <£n = 
r into the sum of a Toeplitz matrix and an error matrix. The Toeplitz 
matrix may be evaluated in the frequency domain and inverted by 
Toeplitz matrix inversion methods (or approximately by dft methods). 
The error matrix will be shown to decrease (relative to the Toeplitz 
part) as 1/N, where N is the number of data points. Thus as N 
increases, the error in the solution (relative to the lsa solution) will 
decrease at the rate of 6 dB per octave as the number of data points 



kR + «>pR + m 




pR + m - L+ 1 

kR + B- L+ 1 

Fig. 2 — Relative position of the windows for the matrix element <fcp*. 
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increases. Significant errors in n due to the additive noise q{n) are also 
present when the s/n is small, and these errors decrease at the rate of 
3 dB per octave as N increases. 6 Thus, for large enough N, the 
truncation errors due to our approximation of the lsa matrix equation 
will be less than those due to the additive noise q(n). Under these 
conditions, the Toeplitz estimate will be as accurate as the lsa result. 
We split eq. (11) as follows 

$ = <j> + € 

r = r + fi 

h = & + A, (28) 

where c is the non-Toeplitz part of <£, </> is Toeplitz and symmetric (i.e., 
<f> = [<f>(/ — m)] = [</>(m — I)]), and n satisfies the equation 

<£n - r. (29) 

A is the error between the lsa solution and the Toeplitz solution eq. 
(29). By bounding the norm of A, || A || = || 6 — h ||, we can evaluate the 
error introduced by our procedure. 

To obtain an expression for <£, we observe the following: When Na 
and Nb are inside the range of the sum on n, namely, M — 1 ^ n ^ N 
— 1, the sum on n is limited by the windows rather than by the data. 
We define </> to be composed of all terms of (f> P k(l, m) of eq. (21) such 
that Na and Nb he inside the natural interval of the data independent 
oil — m and € to include all the remaining terms of <f> p *(Z, m). Thus 

i(l - m) - J J fad, m) (30) 

pes AeS 

f(D = I 2 />(/) (31) 

pes AeS 

€(/, m) = I £ fetft m) (32) 

p£S M0B 

8(1) =11 r pk (l) (33) 

p>£S Af£S 

define the matrix elements of </>, f , €, and fi, respectively. For <f> and f , 
the sums on p and A are over the set S(/>, k), while e and 5 are summed 
over all k and p outside S. The set S is defined by all integers p, k, as 
shown by the dots in Fig. 3, such that 

N A 2* M ~ 1 

Nb^N-1 

N a ^ N B . (34) 

Equation (34) is to be satisfied for all m and I in the range [0, M — 1]. 
In the appendix we give an explicit formula for <f>(Z — m). From Fig. 3 
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M = 32 

N' = 1024 

L = 128 

R = 32 

q, =-4 
q 2 = 4 

P A = 5 

Pb ■ 31 



35 



Fig. 3— Points comprising the set S in the (p, k) plane which are used in computing 

and r. Also shown are the regions which define e (see the appendix). 



we see that, in general, the set forms a strip which is missing small 
pieces at its ends. These small pieces are the set of p, k&S which may 
be used to compute e and 5 (see appendix). 

Because of the definition of the set S, the sum on n in eqs. (22) and 
(24) may be extended to ±<» since the windows naturally truncate the 
data to N A ^ n s£ 2V B . As a result of this definition, <£ is only a function 
oil - m and is therefore Toeplitz. Such is not the case for c(/, m), 
however, since most terms in this sum are truncated at either n = 
M— lor n = N — 1. Finally, note that the expected value of elements 
of </» or r increases linearly in N (i.e., is of order N), 

E(f) - O(N) 

E(r) = O(N), 

while the expected value of elements of € and 8 is of order 1 (i.e., 
independent of N) 
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E(5) = 0(1) 

E(e) = O(l). 

Thus, as N approaches oo, the solution of eq. (29) approaches that of 
eq. (16) at a rate of 6 dB per octave in N. 

V. TRUNCATION ERROR ANALYSIS 

A bound on the truncation errors may be obtained from eqs. (16), 
(28), and (29) in the following way. From eqs. (16) and (28), 

(4 + e)(n + A) = r + A (35) 

or 

<>n + «>A + €n + €A = r + 5. (36) 

As a result of eq. (29), 

<|>A + en + eA = S. (37) 

Next we multiply by <j> -1 and solve for A as 

A = jr l {B - en) - ^ _I cA. (38) 

Forming the norm of each side of eq. (38), we define a measure of error 
Qa [eq. (5)] which, after some algebra, may be shown to be bounded 
by (|| • || as defined here is the Euclidean norm of a vector) 



Q A = 20 logi 



20 log, 



l»- x IHI*-«ft|l 
Li-H<m-NI 



(39) 



If we let MA) denote the ith eigenvalue of the matrix A, then it can 
be shown that 

||<TMl={minA.(rt)}- 1/2 

i 

||€|| = {maxA I (€'€)} 1 / 2 



where, by definition, 



>M-\ \>/ 2 

|n|, = ( I h 2 (l)j 



/-o 



)l/2 
• 



(40) 



Useful bounds on these norms are easily obtained for € and <£~\ For 
the latter, if 4»(w*) is the 2M — 1 point dft of [<£o, <fa, • • •, 4>m-i, 4>i-m, 
• • • i ^-i]i then 
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H^-l^minUKft)*)!- 1 . (41) 

In practice, <f>(u*) and the dft of r, R(uk) are directly computed from 
the data x(n) and y{n). Qa is only of theoretical interest and would 
generally not be computed in a real problem. 

VI. IMPLEMENTATION OF THE SHORT-TIME SPECTRAL APPROACH TO 
SYSTEM IDENTIFICATION 

As a result of the previous discussion, our method is implemented as 
follows: 

(i) Form windowed data segments y p (n) and Xk(n) for each p and k 
inS. 

(ii) Compute the correlations $p k (l — m) and r P k(l) which may be 
done as follows (using fast correlation techniques): 

Xk(u) = F{x k (n)) 

Y p (a) = F{y p (n)} 

*p*(«)=Xf(«).Xp(a>) 

R pk (u)=Xt(u)Y p (u) 

fain) = rl*nM] 

f pk (n) - T-\&pk(ia)l (42) 

F[-] and F~'[»] are the Fourier transform operations and ()* defines 
conjugation. 

(Hi) Form <j>(l) and f(l) by summing over all/?, k E S as discussed in 
the appendix. (In practice, this computation is done recursively. Fur- 
thermore, the sum is best done in the frequency domain.) 

(iv) (Solution Method 1) Finally, solve the matrix equation 

<j>h = r. 

This equation may be solved by Toeplitz matrix inversion methods 
which require only M 2 operations and 2M storage locations. 

(iv') (Solution Method 2) Alternatively, under some conditions we 
may approximately find i?(w) by Fourier transforms from 



fc(u) = F 
£(«) = F 



pes AeS 

£ 2 f P k(D 

pes Aes 



fir * * (w) 

O(co) 
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4>(<o) is the autospectral estimate and $(w) is the cross-spectral esti- 
mate. They sharply differ from the classical definitions of these quan- 
tities because the cross terms p ¥* k have been included. The inclusion 
of these cross terms is responsible for the removal of the bias in these 
estimates (see the next section). The dft version of the above differs 
slightly in some of its details. 



VII. BIAS AND THE CLASSICAL CASE OF SPECTRAL ESTIMATION 

The most common method of spectral estimation is equivalent to 
forming the estimate 



A 



(43) 



where the sum on k is taken on nonoverlapping or slightly overlapping 
intervals (i.e., R = L or R = L/2). There appear to be several flaws in 
this method. First, R should be less than L/fl where fl is the time- 
bandwidth product of the window, as is required by the Nyquist 
theorem. 2 " 4 For a Hamming window, Q is 4. Second, because of the 
absence of cross terms, bias is present in the estimate as may be seen 
by inverse Fourier transforming S xy (u). For example, if in eqs. (29) to 
(31) we modify the sum on k to be k = p + q with q a fixed integer, 
then eq. (29) becomes 



M-\ 



£ h(m) £ <t> P , P+g (l - m) = £ f p , p+g (l). 



(44) 



If we define the (decimated) autocorrelation of the window by 

Ml) = I w(pR)w(pR + /), (45) 

p 

then, for a white noise input (with variance al ) and for no additive 
noise (a 2 q = 0), if we assume ergodicity, i.e., 



lim 



i N-\ 

— X x(n)x(n + m) 

IV n=0 



= ol8(m) = E[x(n)x(n + in)], (46) 



where 8(m) is one when m = and zero otherwise, the term in the left- 
hand side of eq. (44) reduces to (as N —* oo) 



lim 

N— oo 



■fi £ <i>P.P+q(l ~ m) 



= lim 

N—00 



— £ x(n - m)x(n - I) 



£ w(pR + m — n) 

p 



(47) 



•w((p + q)R + 1 - n) 

= a 2 x S(m - M w (qR + / - m) (48) 
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and the right-hand side of eq. (44) similarly reduces to 



lim 

N-.00 



Tt X fp.p+qil) 



= lim 



i I y(n)x(n - I) 

iV n =0 



£ w(p# - n)w((/> + q)R + I - n) 
p 



= olh{l)MqR + /)• 

From eqs. (44), (48), and (50), we get 

alh(l)MqR) = olh(l)MqR + D 
or 

MqR + I) 



fi(l) = h(l) 



MqR) 



(49) 
(50) 

(51) 
(52) 



Equation (52) shows that the effect of the window is to modify the 
estimate of h by the quantity \p w (qR + l)/^ w (qR). When q = [the 
classical case, eq. (43)], we have the result 



W) = HI) 



MD 



(53) 



MO) ' 

thus, h(l) is a biased version of h(l). By summing over q [i.e., sum eq. 
(44) over q], the bias is removed since ^ is a lowpass function, 
satisfying the relation [see eq. (18)] 

X MqR + I) - X MqR) - constant (54) 

and eq. (52) gives the desired result 

fi(l) = h(l), (55) 

independent of the window. 

VIII. EXPERIMENTAL RESULTS 

In this section, we give some numerical results for a simulated 
system identification problem. Using the system of Fig. 1 as the model, 
a specific fir system was chosen for h(n) with impulse response 
duration M = 7 samples. This is the simple system used in Refs. 1 and 
6, and its impulse response is given by 



n 





1 


2 


3 


4 


5 


6 


h(n) 


0.1 


0.5 


1.0 


0.5 


-0.5 


-1.0 


0.5 



The input to the system x(n) was a Gaussian noise with zero mean 
and variance a 2 x . The additive noise q(n) was an independent Gaussian 
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noise with zero mean and variance a,. In each of these examples, a 
Hamming window was used. 

Figure 4 shows a plot of Q [eq. (5)] as a function of iVforo, = (i.e., 
no additive noise) and for values of the parameter q , where q is the 
number of off-diagonal cross terms used in estimating «f» and r, as 
expressed in the form 



9o 



P2 



Q"-Qo P"Pi 
9o Pi 

r(D = 2 S r p , p+q (l). 

<7=-9o P"Pl 



(56) 



(57) 



For this example, solution method 1 [Section VI, step (iv)] was used to 
solve the matrix equation. As qo gets larger (i.e., as more cross terms 
are included), the bias is removed as seen in Fig. 4. If qo > Qmax, where 
[see eq. (70) in the appendix] 

M + L-2 



(jl max 



R 



(58) 



then no further changes occur in <£(/), r(/) (or Q). In this example, q max 
= 5. The effects of the bias for q = 0, 1, 2, and 3 are such that, for 
large values of N, Q is from 15 to 45 dB worse than for the unbiased 
estimates. For values of qo greater than 3, only small changes occur in 
Q. Thus there is a computational tradeoff — especially in cases where 
<7max is calculated using M and M is greater than M . When M ~ M and 



-10 



Q -30 



-40 



q„ = ° 




M= 16 
L = 16 
R = 4 

MATRIX SOLUTION 1 
■ • ■ ■ i 



q„ = " 



50 100 



5000 



Fig. 4 — Plots of Q versus N for M = 16, L = 16, R = 4, s/n » oo, and several values of 
qo, using matrix solution method 1 [Section VI, step (iv)]. 
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the impulse response of the system has large values near n = M, then 
values of up to <7 ma x are required for the best solutions. 

Figure 5 shows plots of Q versus N for different Hamming window 
lengths L for the matrix solution method 1 (Fig. 5a) and for the fft 
solution method 2 (Fig. 5b), for a fixed value of M = 8 and for a\ = 0. 
(Comparable results were obtained for larger values of M up to M = 
64.) It can be seen by comparing the curves of Figs. 5a and 5b that the 
Q values obtained from the matrix solutions were from 10 to 20 dB 
better than those obtained from the fft solution [Section VI, step 
(iv)] for small N (such that Q from the matrix solution was —30 dB or 
larger). For large N, the two methods of solution yielded essentially 
identical results. 



M= 8 

R = L/4 

MATRIX SOLUTION 1 



L = 16 




5000 



Fig. 5 — Plots of Q versus N for M = 8, R = L/4, s/n = », and for several values of L 
for: (a) matrix method 1 and (b) fft method 2 [Section VI, step (iV)]. 
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From Figure 5a, two additional points are noted. First, we see that 
the asymptotic behavior of the curves of Q versus N is a 6-dB-per- 
octave slope as shown by the dotted line in this figure. For small N, 
the curves deviate from this slope. Second, we see small but consistent 
differences in the curve of Q versus N for different values of L. An 
exact explanation of this result is beyond the slope of this paper; 
however, eq. (39) shows that Q& is directly proportional to the norm of 
8 - en. It can be shown that || 8 - eh || gets smaller as L increases 
(since the error terms become more highly correlated), thus Qa depends 
on L in a very complicated manner as illustrated in Fig. 5. A simple 
analysis of the effects of || 8 — eh. || on Q is as follows. If 8 and ch are 
not correlated, then 

||* -€h||« ||*|| + ||€h||. (59) 

When the two terms are correlated, 

||fi-€h||« ||*|| + ||eh||. (60) 

Equation (59) implies that, if || 8 \\ is made zero, || 8 — ch || decreases 
since one of two terms has been removed. Equation (60) implies that 
the error must greatly increase when \\8\\ = 0. Experimentally, 8 was 
set to in our numerical simulations (by computing r exactly), and it 
was observed that the error Q increased by more than 20 dB. This 
showed that eq. (59) was a bad approximation because 8 and ch were 
highly correlated. 

Figure 6 shows a direct comparison between the Q values obtained 
from the matrix solution method 1 (Section VI), the fft solution 
method 2 (Section VI), and a third method which we call the classical 
Toeplitz case (solution method 3), for M = 8 (Fig. 6a) and M = 64 (Fig. 
6b) and for a\ = 0. For the first two solution methods, a value of L = 
64 was used for the window. Method 3, the classical Toeplitz case, 5 
computes matrix elements </>r(J) and rr(l) directly from the data as 

N-i-m 
♦r(J) = S x(n)x(n + /) 



r T (D= S x(n)y(n + l) (61) 

n=M-\ 

and determines hrin) as the solution of the matrix equation 

f\>TliT = *t- (62) 

By comparing the three solution methods, it is seen that the matrix 
solution method 1 gives the smallest Q values for almost the entire 
range of N, whereas the fft method 2 gives smaller Q values than the 
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Fig. 6 — Comparisons of the curves of Q versus N for L = 64, R = 16, s/n = », for 

solution methods 1 to 3 for (a) M = 8, and (b) M = 64. Method 3 is defined in the text 
by eq. (62). 

Toeplitz method 3 for large values of N, and larger Q values for small 
N. Analysis of the classical Toeplitz case shows that this method is 
identical to the matrix solution method 1 using a rectangular window 
with a shift of R = L (i.e., an entire window shift). Figure 6 thus 
directly demonstrates that Hamming window results are better than 
rectangular window results. 

Figure 7 shows a plot of Q versus J14T for a fixed value of N = 1024 
and for several values of L. It can be seen that the curves of Q versus 
M have a slope of about 3 dB per octave in M, as predicted based on 
the definition of eq. (5). 6 However, we also see the interesting effect 
that, for certain values of Kt and for different values of L, the value of 
Q makes a discrete jump and then jumps back to its previous level 
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(approximately). This effect is due to discrete changes in the limits q\ 
and qi with L (see appendix) which is directly reflected in the curves 
of Q versus M. The point of this figure is to show the effect of the 
positioning of the windows with respect to the data. Depending on the 
exact window placement, 3 dB of difference can be expected in a 
typical case. 

As a final example, Fig. 8 shows a set of curves of Qv&N for s/n = 
8 dB, M = 16, and for several values of L. Also plotted in the figure is 
the theoretical curve 6 (dashed line) for the least-squares analysis Qlsa 
= 10 \og w [M/N] - s/n. This curve drops at a rate of 3 dB per octave 
as N increases. It is seen that the measured Q curves from the matrix 
solution method 1 are quite close (within a few decibels) to the 
theoretical curve for values of N greater than about 100. In these cases, 
as N increases beyond about 100 points, the s/n-induced errors domi- 
nate the truncation (c and 6 induced) errors. Thus (for N > 100) there 
is no advantage in solving the lsa equation since the Toeplitz solution 
is equally accurate. 

IX. DISCUSSION 

The purpose of this paper has been to focus on the problems of 
system identification and spectral estimation and to investigate tech- 
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Fig. 7— Plots of Q versus M for N = 1024, R = L/4, s/n = oo, and several values of L 
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Fig. 8 — Plots of Q versus NfotM= 16, s/n = 8 dB, and several values of L for matrix 
method 1. The dashed line is the theoretical lsa result for Q as derived in Ref. 6 (see 
text). 



niques based on the short-time spectrum. We have shown that the 
problem of system identification can be solved by expanding each term 
of the least-squares solution in terms of short-time signals. By carefully 
examining and partitioning the terms entering into the computation, 
the system identification problem was approximately transformed 
from a general positive definite matrix inversion problem (the least- 
squares normal equations) to a Toeplitz matrix inversion problem with 
an error which was bounded and of order 1/N. The terms which 
entered into the Toeplitz problem were identified as estimates of the 
power spectrum of the input and the cross power spectrum of the input 
and output. It was shown that the individual spectral estimates were 
unbiased — that is, they were independent of the window used to make 
the estimates. 

Most of this paper has dealt with theoretical and numerical inves- 
tigations of the properties of the resulting spectral estimates and their 
effect on the system identification solution. We have shown the follow- 
ing to be true: 

(i) The spectral estimates F(<j>(/)) and F(r(Z)) are unbiased. 

(«) The quality Q of the system identification estimate improves at 
a rate of 6 dB per octave as N increases for sufficiently large N and 
small additive noise (data-limited region). 

(Hi) The quality Q improves at a rate of 3 dB per octave as N 
increases for sufficiently large N and large additive noise (noise-limited 
region). 

(iv) The resulting method approximates the least-squares normal 
equation by a Toeplitz equation, which is more accurate than the 
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standard Toeplitz approach which uses rectangular windows of dura- 
tion equal to the total data length. 

(v) The quality Q improves at a rate of 3 dB per octave as M 
decreases, if M > M. 

The key issue that remains to be discussed is the possible advantages 
and disadvantages of the short-time spectral approach [compared to 
alternative procedures such as the lsa or lms (least-mean-squares 6 
method)]. The main advantages of this method are: 

(i) Implementation of the spectral estimates is straightforward and 
is readily performed using ffts. 

(ii) The resulting matrix equation can be solved directly using a 
Levinson 8 or Trench 9 inversion, or approximately via fft methods. 

(Hi) The resulting solution has good asymptotic properties with 
respect to the variables N, M, and s/n. 

(iv) The method is easily amenable to adaptive procedures based 
on short-time spectral estimates. This property could be useful for 
systems where the additive noise is nonstationary — e.g., burst noise or 
corrupting speech. 

(v) The method might potentially be applied to problems where M 
is on the order of 1000. 

The disadvantages of the method are: 

(i) The quality factor Q for the noiseless case for data lengths 
comparable to the impulse response duration (N ~ M) is significantly 
worse than that obtained from the lsa method. 

(ii) A Toeplitz matrix equation must still be solved to obtain the 
highest accuracy solutions. However, it should be pointed out that the 
time required to invert a Toeplitz matrix is usually much less that the 
time required to compute it (i.e., when N » fit). 

The final assessment of the utility of this, or any other spectral 
estimation or system identification method, is its applicability to a real 
world problem. One natural application for this method is the echo 
cancellation problem 10 where the impulse responses are long and 
possibly time- varying and additive nonstationary noise is present. We 
anticipate applying our technique to this problem as a more stringent 
test of its capabilities. 

X. SUMMARY 

In this paper, we have discussed a class of system identification and 
spectral estimation methods based on the short-time spectral repre- 
sentation of signals. We have discussed the properties of these methods 
and illustrated them with some simple examples. Our conclusion is 
that, for some applications, this new method provides a practical 
alternative to the classical least-squares analysis method. We would 
like to thank I. W. Sandberg for discussion on Section V. 
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APPENDIX 

We derive here an explicit formula for <j> and €. The constraints on 
Na and Nb are 

N A 2* M - 1 (63) 

Nb^N-1 (64) 

Na^Nb. (65) 

The equation for <j>(Z - m) is 

<f(/-m)= X X <f> p , p+g (Z,m), (66) 

9=9l P=P\ 

where <b PJ) +q{l, m) is given by eq. (22) with k = p + q. Constraint (63) 
gives pi, (64) gives P2, and (65) gives qi and 92 as follows. From eqs. 
(63) and (26), 

pR^M+L-2- max(m, qR + I). 

Since we wish this to hold for all lags (m, /) on [0, M — 1], it must hold 
for m = I = 0, the values which give the greatest p value for which the 
inequality holds. Thus 



, v [M + L-21 te% . 

pdq) = p - max(0, 9). 



(67) 



The functions \x"\ and |xj are called ceiling(x) and floor(x). They 
are defined by truncation to the next integer above and below x, 
respectively. For example, [WJ = 4, \.—ir\ = —4, [0.5] = 1, etc. For any 
x, \x] = — [-x\, [x + n\ = LxJ + n, and \x + n] = \x~\ + n, where n is 
an integer. 

Constraint (64) gives />2 with m = l — M— las the worst case. Using 
eq. (27), 

LN- M I 
— — - min(0, q). (68) 

Finally from constraint (65) and eqs. (26) and (27), 

i ,-i+t-i i (70) 



l m-7+L-l I 



Since p t and p-i are functions of q, the p(^) sum must be done first 
as shown in (66). Thus (66) to (70) completely specify <£. 

Next we give formulas for e. e has two components, as may be seen 
in Fig. 3, which we call e+ and e_. Then e = e+ + €-, where 
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with 



e-(l,m)= £ I <hkd,m) (71) 



P3 = \?LlLzji\ < 72) 

fc.^.f^Zi]-! (73, 

fc=M (74, 



and 



P6 *6 

e+ (/,m)= J X «fc*tt»») (75) 

p=p 5 *=A 5 



*6-p 8 - — =— + 1 < 76 > 



I N-2+L-m I 

"-L — s — J 

. I N-2+L-1 I 

*-L — s — J- 



(77) 
(78) 
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